127 research outputs found

Mining Electronic Health Records to Validate Knowledge in Pharmacogenomics

Author: Coulet Adrien
Smaïl-Tabbone Malika
Publication venue: ERCIM
Publication date: 01/01/2016
Audience: International

The state of the art in pharmacogenomics (PGx) rests on a body of knowledge derived from sporadic observations, and is therefore not considered statistically valid. The PractiKPharma project mines data from electronic health record repositories and composes novel patient cohorts in order to confirm (or moderate) pharmacogenomics knowledge on the basis of observations made in clinical practice.

INRIA a CCSD electronic archive server

Predicting defects in the trees of Grenoble's urban tree stock and recommendations for future plantings

Author: Dalleau Kevin
PER Yelen
Smaïl-Tabbone Malika
Publication venue: HAL CCSD
Publication date: 23/01/2017
Audience: National

In this article we describe our response to the EGC 2017 challenge. An exploratory data analysis first allowed us to understand the distributions of the various variables and to detect strong correlations. We defined two additional variables derived from the variables of the dataset. Several supervised classification algorithms were tested to address task 1 of the challenge. Performance was evaluated by cross-validation, which allowed us to select the best single-label and multi-label classifiers. On both the single-label and the multi-label task, the best classifier outperforms the baselines by about 2%. We also explored task 2 of the challenge. On the one hand, we searched for association rules. On the other hand, the dataset was enriched with knowledge such as climate data (rainfall, temperature, wind) and botanical taxonomic data (family, order, superorder). In addition, geographic and cartographic data are used in a tool for visualizing part of the tree data.

INRIA a CCSD electronic archive server

Unsupervised Extra Trees: a stochastic approach to compute similarities in heterogeneous data.

Author: Couceiro Miguel
Dalleau Kevin
Smaïl-Tabbone Malika
Publication venue: Springer Science and Business Media LLC
Publication date: 31/03/2020
Audience: International

In this paper we present a method to compute similarities on unlabeled data, based on extremely randomized trees. The main idea of our method, Unsupervised Extremely Randomized Trees (UET), is to randomly split the data in an iterative fashion until a stopping criterion is met, and to compute a similarity based on the co-occurrence of samples in the leaves of each generated tree. Using a tree-based approach to compute similarities is interesting, as the inherent […] We evaluate our method on synthetic and real-world datasets by comparing the mean similarity between samples with the same label and the mean similarity between samples with different labels. These metrics are analogous to intracluster and intercluster similarities, and are used to assess the computed similarities directly rather than through a clustering algorithm's results. Our empirical study shows that the method assigns distinct similarity values to samples belonging to different clusters, and indistinguishable values when there is no cluster structure. We also assess some interesting properties, such as invariance under monotone transformations of variables and robustness to correlated variables and noise. Finally, we performed hierarchical agglomerative clustering on synthetic and real-world homogeneous and heterogeneous datasets using UET versus standard similarity measures. Our experiments show that the algorithm outperforms existing methods in some cases and can reduce the amount of preprocessing needed for many real-world datasets.
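The UET idea summarized above (split the data at random until a stopping criterion is met, then measure similarity by how often two samples share a leaf) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles numeric features only (the paper targets heterogeneous data), the stopping criterion is assumed to be "a node holds at most `min_size` samples", and the names `grow_tree`, `uet_similarity`, `min_size` and `n_trees` are hypothetical.

```python
import random


def grow_tree(X, rng, min_size):
    """Grow one unsupervised random tree; return its leaves as lists of row indices.

    Assumed rule: each node picks a random feature that is not constant within
    the node, then a uniform random threshold between that feature's min and max.
    Nodes with at most min_size samples (hypothetical stopping criterion) are leaves.
    """
    leaves, stack = [], [list(range(len(X)))]
    while stack:
        idx = stack.pop()
        if len(idx) <= min_size:
            leaves.append(idx)
            continue
        candidates = [f for f in range(len(X[0]))
                      if min(X[i][f] for i in idx) < max(X[i][f] for i in idx)]
        if not candidates:  # all remaining samples are identical: stop here
            leaves.append(idx)
            continue
        f = rng.choice(candidates)
        lo = min(X[i][f] for i in idx)
        hi = max(X[i][f] for i in idx)
        t = rng.uniform(lo, hi)
        while t <= lo:  # a threshold above the minimum keeps the left child non-empty
            t = rng.uniform(lo, hi)
        stack.append([i for i in idx if X[i][f] < t])
        stack.append([i for i in idx if X[i][f] >= t])
    return leaves


def uet_similarity(X, n_trees=100, min_size=3, seed=0):
    """Similarity = fraction of trees in which two samples end up in the same leaf."""
    rng = random.Random(seed)
    n = len(X)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        for leaf in grow_tree(X, rng, min_size):
            for i in leaf:
                for j in leaf:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]
```

On two well-separated groups of points, most random thresholds fall in the gap between them, so within-group similarities come out close to 1 and between-group similarities close to 0. The resulting matrix can be turned into distances (for example `1 - s`) and passed to any hierarchical clustering routine.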

INRIA a CCSD electronic archive server

Clustering graphs using random trees

Author: Couceiro Miguel
Dalleau Kevin
Smaïl-Tabbone Malika
Publication venue: HAL CCSD
Publication date: 09/09/2019
In this work-in-progress paper, we present GraphTrees, a novel method that relies on random decision trees to compute pairwise distances between vertices in a graph. We show that our approach is competitive with state-of-the-art methods on non-attributed graphs in terms of clustering quality. By extending an already ubiquitous approach, random trees, to graphs, our method opens new research directions that can leverage decades of research on this topic.
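The abstract does not say how GraphTrees splits vertices, so the following is only a hypothetical sketch of one way random trees can yield vertex distances: each tree recursively splits a vertex set by membership in the closed neighborhood of a randomly chosen pivot vertex, and the distance between two vertices is one minus the fraction of trees in which they share a leaf. The splitting rule, the `min_size` stopping criterion, and the names `random_graph_tree` and `graph_tree_distances` are all assumptions, not the paper's method.

```python
import random


def random_graph_tree(adj, rng, min_size):
    """One random tree over the vertices of `adj` (dict: vertex -> set of neighbors).

    Hypothetical splitting rule: pick a random pivot p whose closed neighborhood
    N[p] = adj[p] | {p} cuts the current node, then split into inside/outside N[p].
    Vertex labels must be sortable (e.g. ints or strings).
    """
    leaves, stack = [], [sorted(adj)]
    while stack:
        node = stack.pop()
        if len(node) <= min_size:
            leaves.append(node)
            continue
        splits = []
        for p in sorted(adj):
            closed = adj[p] | {p}
            inside = [v for v in node if v in closed]
            if 0 < len(inside) < len(node):  # only pivots that actually split the node
                splits.append((inside, [v for v in node if v not in closed]))
        if not splits:
            leaves.append(node)
            continue
        inside, outside = rng.choice(splits)
        stack.append(inside)
        stack.append(outside)
    return leaves


def graph_tree_distances(adj, n_trees=100, min_size=3, seed=0):
    """Distance = 1 - fraction of trees in which two vertices share a leaf."""
    rng = random.Random(seed)
    together = {(u, v): 0 for u in adj for v in adj}
    for _ in range(n_trees):
        for leaf in random_graph_tree(adj, rng, min_size):
            for u in leaf:
                for v in leaf:
                    together[(u, v)] += 1
    return {pair: 1.0 - count / n_trees for pair, count in together.items()}
```

On two triangles joined by a single edge, vertices of the same triangle should end up at distance 0 or near it, and vertices of different triangles near 1; such a distance matrix can then feed a standard clustering algorithm.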

INRIA a CCSD electronic archive server

An Experimental Evaluation of Similarity-Based and Embedding-Based Link Prediction Methods on Graphs

Author: Aridhi Sabeur
Islam Md Kamrul
Smaïl-Tabbone Malika
Publication venue: Academy and Industry Research Collaboration Center (AIRCC)
Publication date: 30/09/2021
Audience: International

The task of inferring missing links or predicting future ones in a graph based on its current structure is referred to as link prediction. Link prediction methods based on pairwise node similarity are well-established approaches in the literature and show good prediction performance on many real-world graphs, even though they are heuristic. On the other hand, graph embedding approaches learn low-dimensional representations of the nodes of a graph and are capable of capturing inherent graph features, and thus support the subsequent link prediction task. This paper studies a selection of methods from both categories on several benchmark (homogeneous) graphs with different properties from various domains. Beyond the intra- and inter-category comparison of the methods' performance, our aim is also to uncover interesting connections between Graph Neural Network (GNN)-based methods and heuristic ones, as a means to alleviate the well-known black-box limitation of GNNs.
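To make the "pairwise node similarity" category concrete, here is a sketch of three classic heuristic scores (common neighbors, Jaccard coefficient, Adamic-Adar index) used to rank non-adjacent vertex pairs as candidate links. This illustrates the category in general; which heuristics the paper actually evaluates is not stated in the abstract, and the function names are hypothetical.

```python
import math
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency dict (vertex -> set of neighbors)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    # A common neighbor z of two distinct vertices has degree >= 2, so log(deg) > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def rank_candidate_links(adj, score):
    """Score every non-adjacent vertex pair and sort by decreasing score."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(((score(adj, u, v), u, v) for u, v in pairs), key=lambda t: -t[0])
```

For example, `rank_candidate_links(from_edges([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]), jaccard)` puts the pair `("B", "C")` first, since both vertices have exactly the same neighbors. Embedding-based methods replace these hand-crafted scores with a similarity computed between learned node vectors.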

INRIA a CCSD electronic archive server

Extracting pharmacogenomic data from clinical studies: problem statement

Author: Coulet Adrien
Devignes Marie-Dominique
Smaïl-Tabbone Malika
Publication venue: HAL CCSD
Publication date: 18/01/2005
The importance of individual variations in drug response is becoming a significant issue for both pharmaceutical research and medical practice. Our research project aims to integrate clinical and genetic data from clinical studies, with the goal of extracting knowledge about the relationships between a particular genotype and its influence on the effect of a drug. To address this problem, we are looking for data mining methods that are suited to the biomedical data we wish to handle and that can integrate domain knowledge in the form of an ontology. This project is the subject of a PhD thesis that began in November 2004.

INRIA a CCSD electronic archive server

Kbdock - Searching and organising the structural space of protein-protein interactions

Author: Devignes Marie-Dominique
Ritchie David
Smaïl-Tabbone Malika
Publication venue: ERCIM
Publication date: 15/01/2016
Audience: International

Big data is a recurring problem in structural bioinformatics, where even a single experimentally determined protein structure can contain several different interacting protein domains and often involves many tens of thousands of 3D atomic coordinates. If we consider all protein structures that have ever been solved, the immense structural space of protein-protein interactions needs to be organised systematically in order to make sense of the many functional and evolutionary relationships that exist between different protein families and their interactions. This article describes some new developments in Kbdock, a knowledge-based approach for classifying and annotating protein interactions at the protein domain level.

INRIA a CCSD electronic archive server

NRPS toolbox for the discovery of new nonribosomal peptides and synthetases

Author: Devignes Marie-Dominique
Jacques Philippe
Leclère Valérie
Pupin Maude
Smaïl-Tabbone Malika
Publication venue: HAL CCSD
Publication date: 03/07/2012
Audience: National

Nonribosomal peptide synthetases are huge multi-enzymatic complexes that synthesize peptides without going through the classical process of transcription followed by translation. The synthetases are organised in modules, each of which incorporates one amino acid into the final peptide. The modules are divided into domains that provide specialized activities. These enzymes are therefore as diverse as their products. We present our toolbox, designed to annotate them accurately, and promising results obtained on some Burkholderia, Bacillus, and Pseudomonas genomes.

HAL - Lille 3

INRIA a CCSD electronic archive server

Formal Concept Analysis Applied to Transcriptomic Data

Author: Alam Mehwish
Coulet Adrien
Napoli Amedeo
Smaïl-Tabbone Malika
Publication venue: HAL CCSD
Publication date: 27/08/2012
Audience: International

Identifying functions or pathways shared by genes responsible for cancer is still a challenging task. This paper describes the preparatory work for applying Formal Concept Analysis (FCA) to biological data. After gene transcription experiments, we integrate various annotations of the selected genes into a database, along with relevant domain knowledge. The database then allows formal contexts to be built flexibly. We present a preliminary experiment using these data on a core context, with domain knowledge added by context apposition. The resulting concept lattices are pruned, and we discuss some interesting concepts. Our study shows how data integration and FCA can help the domain expert explore complex data.
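For readers new to FCA, the two operations mentioned above (building the concepts of a formal context, and extending a core context with domain knowledge by apposition) can be sketched on a toy gene/annotation table. This is a brute-force illustration, exponential in the number of objects, not the pipeline used in the paper; the gene and annotation names and all function names are hypothetical.

```python
from itertools import combinations


def intent_of(context, objects, all_attributes):
    """Attributes shared by all given objects (all attributes if none are given)."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return shared


def extent_of(context, attributes):
    """Objects that have every given attribute."""
    return {g for g, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every subset of objects."""
    all_attributes = set().union(*context.values())
    concepts = set()
    objects = sorted(context)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = intent_of(context, subset, all_attributes)
            concepts.add((frozenset(extent_of(context, intent)), frozenset(intent)))
    return concepts


def appose(core, extra):
    """Context apposition: same objects, disjoint attribute sets, rows joined."""
    if set(core) != set(extra):
        raise ValueError("apposed contexts must share the same objects")
    if set().union(*core.values()) & set().union(*extra.values()):
        raise ValueError("apposed contexts must have disjoint attributes")
    return {g: core[g] | extra[g] for g in core}


# Hypothetical core context: genes annotated with functions.
core = {"g1": {"apoptosis", "cell_cycle"}, "g2": {"cell_cycle", "dna_repair"}, "g3": {"cell_cycle"}}
# Hypothetical domain knowledge: pathway membership of the same genes.
extra = {"g1": {"p53_pathway"}, "g2": {"p53_pathway"}, "g3": set()}
```

Apposing `extra` to `core` adds a concept grouping `g1` and `g2` under the shared attributes `cell_cycle` and `p53_pathway`, which the core context alone cannot express; this is the kind of refinement that domain knowledge brings to the lattice.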

INRIA a CCSD electronic archive server

HAL-Rennes 1

BR-Explorer: A sound and complete FCA-based retrieval algorithm (Poster)

Author: Devignes Marie-Dominique
Messai Nizar
Napoli Amedeo
Smaïl-Tabbone Malika
Publication venue: HAL CCSD
Publication date: 13/02/2006
In this paper we present BR-Explorer, a sound and complete algorithm for retrieving biological data sources, based on Formal Concept Analysis and domain ontologies. BR-Explorer addresses the problem of retrieving the relevant data sources for a given query. Initially, a formal context representing the relation between biological data sources and their metadata is provided, and its corresponding concept lattice is built. BR-Explorer then generates the formal concept for the considered query and inserts it into the provided concept lattice. The next step is to locate the "pivot" concept in the resulting concept lattice. Starting from this pivot concept, BR-Explorer builds the result step by step by considering the pivot's superconcepts in the resulting lattice until the top concept is reached. Finally, BR-Explorer returns the set of relevant data sources, ranked by their relevance to the considered query. An ontology-based query refinement procedure is integrated into BR-Explorer. This procedure takes advantage of semantic information about the data source metadata and the queries to improve BR-Explorer's results.
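The retrieval idea (start from the most specific concept matching the query, then generalize toward the top concept, collecting sources along the way) can be approximated without building a lattice at all. The sketch below is a deliberately simplified, brute-force stand-in, not BR-Explorer itself: instead of inserting a query concept and walking superconcepts, it enumerates subsets of the query attributes from largest to smallest and reports each source at the first level where it matches. It has no ontology-based refinement, and all names and the toy metadata are hypothetical.

```python
from itertools import combinations


def extent(context, attributes):
    """Data sources whose metadata contain every attribute in `attributes`."""
    return {s for s, meta in context.items() if attributes <= meta}


def retrieve(context, query):
    """Rank sources by the largest subset of query attributes they satisfy.

    Level k collects sources matching some k-attribute subset of the query; a
    source is reported once, at the highest k where it first appears. The empty
    subset (the top concept, i.e. every source) is not reported.
    """
    query = set(query)
    ranked, seen = [], set()
    for k in range(len(query), 0, -1):
        level = set()
        for subset in combinations(sorted(query), k):
            level |= extent(context, set(subset)) - seen
        ranked.extend((s, k) for s in sorted(level))
        seen |= level
    return ranked


# Hypothetical metadata for four data sources.
sources = {
    "S1": {"protein", "structure"},
    "S2": {"protein", "sequence"},
    "S3": {"gene", "expression"},
    "S4": {"protein", "structure", "sequence"},
}
```

With the query `{"protein", "structure"}`, sources matching both attributes come first at level 2, `S2` follows at level 1, and `S3` is never returned because it shares no query attribute. Enumerating query subsets is exponential in the query size, which is one reason a lattice-based traversal such as the one described above is preferable on real collections.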

INRIA a CCSD electronic archive server